Genome-wide Protein-chemical Interaction Prediction

Author

  • Aaron Smalter Hall
Abstract

The analysis of protein-chemical reactions on a large scale is critical to understanding the complex, interrelated mechanisms that govern biological life at the cellular level. Chemical proteomics is a new research area aimed at genome-wide screening of such chemical-protein interactions. Traditional approaches to such screening involve in vivo or in vitro experimentation, which, while becoming faster with the application of high-throughput screening technologies, remains costly and time-consuming compared to in silico methods. Early in silico methods depend on knowing 3D protein structures (docking) or binding information for many chemicals (ligand-based approaches). Typical machine learning approaches follow a global classification approach in which a single predictive model is trained for an entire data set, but such an approach is unlikely to generalize well to the protein-chemical interaction space, given its diversity and heterogeneous distribution. In response to the global approach, work on local models has recently emerged to improve generalization across the interaction space by training a series of independent models, each localized to predict a single interaction. This work examines current approaches to genome-wide protein-chemical interaction prediction and explores new computational methods based on modifications to the boosting framework for ensemble learning. The methods are described and compared to several competing classification methods. Genome-wide chemical-protein interaction data sets are acquired from publicly available resources, and a series of experimental studies is performed to compare the performance of each method under a variety of conditions.
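
The contrast between a global model and local models can be illustrated with a small sketch. This is not the method of the work summarized above: it swaps in a plain nearest-centroid classifier instead of boosting, the data in `PAIRS` is invented, and every name (`train_centroids`, `predict`, `accuracy`) is hypothetical. The toy global model also sees only the chemical descriptor and ignores protein identity, which exaggerates the gap, and accuracy is measured on the training pairs rather than on held-out data.

```python
from collections import defaultdict

# Invented toy data: (protein_id, chemical_descriptor_vector, interacts).
# The two proteins prefer opposite ends of the descriptor range.
PAIRS = [
    ("P1", [0.875], 1), ("P1", [0.75], 1), ("P1", [0.125], 0), ("P1", [0.25], 0),
    ("P2", [0.875], 0), ("P2", [0.75], 0), ("P2", [0.125], 1), ("P2", [0.25], 1),
]


def train_centroids(rows):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, defaultdict(int)
    for features, label in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(model_for, pairs):
    """Fraction of pairs predicted correctly; model_for maps a protein to a model."""
    hits = sum(predict(model_for(p), x) == y for p, x, y in pairs)
    return hits / len(pairs)


# Global approach: a single model trained on every pair at once.
global_model = train_centroids([(x, y) for _, x, y in PAIRS])

# Local approach: one independent model per protein.
by_protein = defaultdict(list)
for protein, x, y in PAIRS:
    by_protein[protein].append((x, y))
local_models = {p: train_centroids(rows) for p, rows in by_protein.items()}

global_acc = accuracy(lambda p: global_model, PAIRS)
local_acc = accuracy(lambda p: local_models[p], PAIRS)
print(f"global: {global_acc:.2f}, local: {local_acc:.2f}")
```

In this toy setting the pooled model cannot separate the classes because the two proteins have opposite binding preferences, while the per-protein models can. Real interaction data is far less clean, so the sketch only shows the structural difference between the two training schemes.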





Publication date: 2011